Latent demographic profile estimation in hard-to-reach groups
The sampling frame in most social science surveys excludes members of certain
groups, known as hard-to-reach groups. These groups, or subpopulations, may be
difficult to access (e.g., the homeless), camouflaged by stigma (individuals
with HIV/AIDS), or both (commercial sex workers). Even basic demographic
information about these groups is typically unknown, especially in many
developing nations. We present statistical models which leverage social network
structure to estimate demographic characteristics of these subpopulations using
aggregated relational data (ARD), or questions of the form "How many X's do you
know?" Unlike other network-based techniques for reaching these groups, ARD
require no special sampling strategy and are easily incorporated into standard
surveys. ARD also do not require respondents to reveal their own group
membership. We propose a Bayesian hierarchical model for estimating the
demographic characteristics of hard-to-reach groups, or latent demographic
profiles, using ARD, along with two estimation techniques. The first is a
Markov-chain Monte Carlo algorithm for existing data or cases where the full
posterior distribution is of interest. For cases when new data can be
collected, we offer guidelines and, based on these guidelines, a
simple estimate motivated by a missing data approach. Using data from McCarty
et al. [Human Organization 60 (2001) 28-39], we estimate the age and gender
profiles of six hard-to-reach groups, such as individuals who have HIV, women
who were raped, and homeless persons. We also evaluate our simple estimates
using simulation studies. Comment: Published at http://dx.doi.org/10.1214/12-AOAS569 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
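To make the ARD question format concrete, here is a minimal sketch of the classic network scale-up estimator, which turns "How many X's do you know?" counts into a rough size estimate for a hidden group. This is a simpler, well-known stand-in, not the Bayesian hierarchical model or the demographic-profile estimate described in the abstract above. The function names and the toy numbers are hypothetical.

```python
def estimate_degrees(known_counts, known_sizes, population):
    """Estimate each respondent's personal network size (degree).

    known_counts[i][k] is how many members of known group k respondent i
    reports knowing; known_sizes[k] is that group's true size.
    """
    total_known = sum(known_sizes)
    return [population * sum(row) / total_known for row in known_counts]


def estimate_hidden_size(hidden_counts, degrees, population):
    """Scale-up estimate of a hidden group's size.

    hidden_counts[i] is how many members of the hidden group respondent i
    reports knowing; degrees[i] is that respondent's estimated degree.
    """
    return population * sum(hidden_counts) / sum(degrees)


# Toy data (assumed for illustration): two respondents, two known groups.
population = 100_000
known_sizes = [1_000, 500]
known_counts = [[2, 1], [4, 2]]
hidden_counts = [1, 2]

degrees = estimate_degrees(known_counts, known_sizes, population)
hidden_size = estimate_hidden_size(hidden_counts, degrees, population)
```

Note that respondents never report their own group membership here, only counts of acquaintances, which is the property the abstract highlights.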
Reactive point processes: A new approach to predicting power failures in underground electrical systems
Reactive point processes (RPPs) are a new statistical model designed for
predicting discrete events in time based on past history. RPPs were developed
to handle an important problem within the domain of electrical grid
reliability: short-term prediction of electrical grid failures ("manhole
events"), including outages, fires, explosions and smoking manholes, which can
cause threats to public safety and reliability of electrical service in cities.
RPPs incorporate self-exciting, self-regulating and saturating components. The
self-excitement occurs as a result of a past event, which causes a temporary
rise in vulnerability to future events. The self-regulation occurs as a result
of an external inspection which temporarily lowers vulnerability to future
events. RPPs can saturate when too many events or inspections occur close
together, which ensures that the probability of an event stays within a
realistic range. Two of the operational challenges for power companies are (i)
making continuous-time failure predictions, and (ii) cost/benefit analysis for
decision making and proactive maintenance. RPPs are naturally suited for
handling both of these challenges. We use the model to predict power-grid
failures in Manhattan over a short-term horizon, and to provide a cost/benefit
analysis of different proactive maintenance programs. Comment: Published at http://dx.doi.org/10.1214/14-AOAS789 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
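The three components named in the abstract above (self-exciting, self-regulating, saturating) can be illustrated with a small discrete-time sketch. The functional forms are assumptions chosen for simplicity, not the paper's specification: exponential decay of past influence, a saturating map x / (1 + x), and all parameter values and function names are hypothetical.

```python
import math
import random


def saturate(x, cap):
    """Map x >= 0 into [0, cap): near-linear for small x, flat for large x."""
    return cap * x / (1.0 + x)


def rpp_intensity(t, events, inspections, base=0.05,
                  excite_cap=3.0, regulate_cap=0.8, decay=0.2):
    """Event rate at time t given past event and inspection times.

    Past events raise the rate (self-excitation), past inspections lower it
    (self-regulation), and both effects are capped (saturation). Keeping
    regulate_cap below 1 keeps the rate positive.
    """
    excite = sum(math.exp(-decay * (t - s)) for s in events if s < t)
    regulate = sum(math.exp(-decay * (t - s)) for s in inspections if s < t)
    return base * (1.0 + saturate(excite, excite_cap)
                   - saturate(regulate, regulate_cap))


def simulate(days, inspections, seed=0, **params):
    """Simulate at most one event per day, with probability equal to the rate."""
    rng = random.Random(seed)
    events = []
    for day in range(days):
        p = min(1.0, rpp_intensity(day, events, inspections, **params))
        if rng.random() < p:
            events.append(day)
    return events


# Hypothetical one-year run with inspections on days 100 and 200.
simulated_events = simulate(365, inspections=[100, 200])
```

Because both effects saturate, many closely spaced events or inspections cannot push the rate above base * (1 + excite_cap) or below base * (1 - regulate_cap) in this sketch.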
Estimating spillovers using imprecisely measured networks
In many experimental contexts, whether and how network interactions impact
the outcome of interest for both treated and untreated individuals are key
concerns. Network data are often assumed to perfectly represent these possible
interactions. This paper considers the problem of estimating treatment effects
when measured connections are, instead, a noisy representation of the true
spillover pathways. We show that existing methods, using the potential outcomes
framework, yield biased estimators in the presence of this mismeasurement. We
develop a new method, using a class of mixture models, that can account for
missing connections and discuss its estimation via the Expectation-Maximization
algorithm. We check our method's performance by simulating experiments on real
network data from 43 villages in India. Finally, we use data from a previously
published study to show that estimates using our method are more robust to the
choice of network measure.
Bayesian Hierarchical Rule Modeling for Predicting Medical Conditions
We propose a statistical modeling technique, called the Hierarchical Association Rule Model (HARM), that predicts a patient’s possible future medical conditions given the patient’s current and past history of reported conditions. The core of our technique is a Bayesian hierarchical model for selecting predictive association rules (such as “condition 1 and condition 2 → condition 3”) from a large set of candidate rules. Because this method “borrows strength” using the conditions of many similar patients, it is able to provide predictions specialized to any given patient, even when little information about the patient’s history of conditions is available. National Science Foundation (U.S.) (NSF Grant IIS-10-53407); Google (Firm) (Ph.D. fellowship in statistics
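As a rough illustration of what scoring an association rule such as "condition 1 and condition 2 → condition 3" can look like, the sketch below estimates a rule's confidence from ordered patient histories and smooths it with a fixed Beta-style prior. This is a deliberately simpler stand-in, not HARM itself: the hierarchical sharing across patients is replaced by two hypothetical pseudo-count parameters, and the function names and toy data are assumptions.

```python
def first_match_index(history, antecedent):
    """Return the earliest position by which every antecedent condition has
    been reported, or None if the history never contains all of them."""
    for i in range(len(history) + 1):
        if antecedent <= set(history[:i]):
            return i
    return None


def rule_confidence(histories, antecedent, consequent,
                    prior_hits=1.0, prior_misses=1.0):
    """Smoothed estimate of P(consequent later | antecedent already seen).

    The pseudo-counts pull rules with little support toward
    prior_hits / (prior_hits + prior_misses).
    """
    antecedent = set(antecedent)
    support = 0
    hits = 0
    for history in histories:
        i = first_match_index(history, antecedent)
        if i is None:
            continue
        support += 1
        if consequent in history[i:]:
            hits += 1
    return (hits + prior_hits) / (support + prior_hits + prior_misses)


# Toy histories (assumed): each list is one patient's conditions in order.
histories = [["a", "b", "c"], ["a", "b"], ["b", "a", "c"], ["c"]]
smoothed = rule_confidence(histories, {"a", "b"}, "c")
unseen = rule_confidence(histories, {"x"}, "c")
```

A rule with no supporting histories falls back to the prior mean, which loosely mirrors the abstract's point about making predictions when little patient history is available.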